encode to utf-8 instead of latin-1 #15

spock · 2023-11-13T00:03:38Z

to add support for e.g. Cyrillic characters in file names; untested, let's see if CI/CD passes :)

related to #13

codecov · 2023-11-13T15:37:34Z

Codecov Report

All modified and coverable lines are covered by tests ✅

see 5 files with indirect coverage changes


lrq3000 · 2023-11-13T15:40:39Z

Thank you very much for your PR! Unfortunately it fails to build, but there is hope, because more than half of the tests pass, so this suggests that with minor rewrites to the tests it may work fine again. I'll have a look.

lrq3000 · 2024-11-20T10:35:59Z

I did not reply earlier, but I did look into this, and getting the remaining tests to pass involves more than minor rewrites.

I will eventually come back to it when I have time, and maybe AI is advanced enough now to actually be useful for this kind of bug fixing.

Copilot

Pull Request Overview

This PR changes the character encoding from latin-1 to utf-8 in the b() compatibility function to enable support for Cyrillic and other non-ASCII characters in file names.

Key Changes:

Updated b() function to use utf_8_encode instead of latin_1_encode for Python 3 string-to-bytes conversion
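To make the effect of this change concrete, here is a minimal sketch of the two variants of `b()` side by side. It assumes Python 3, where the `_str` alias from `_compat.py` is taken to be plain `str`; the names `b_latin1`, `b_utf8`, and the sample file name are hypothetical and exist only for this illustration.

```python
import codecs

def b_latin1(x):
    # Old behaviour: latin-1 can only represent code points U+0000..U+00FF.
    if isinstance(x, str):
        return codecs.latin_1_encode(x)[0]
    return x

def b_utf8(x):
    # New behaviour: UTF-8 can represent any Unicode code point.
    if isinstance(x, str):
        return codecs.utf_8_encode(x)[0]
    return x

name = "файл.txt"  # hypothetical Cyrillic file name

try:
    b_latin1(name)
    latin1_ok = True
except UnicodeEncodeError:
    # Cyrillic letters lie outside the latin-1 range, so encoding fails.
    latin1_ok = False

encoded = b_utf8(name)
print(latin1_ok, encoded)
```

With the latin-1 variant the Cyrillic name cannot be encoded at all, while the UTF-8 variant produces two bytes per Cyrillic letter plus one byte per ASCII character.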


Copilot · 2025-11-13T11:24:23Z

pyFileFixity/lib/_compat.py

    def b(x):
        if isinstance(x, _str):
-            return codecs.latin_1_encode(x)[0]
+            return codecs.utf_8_encode(x)[0]


The encoding has been changed from latin-1 to utf-8 in the b() function, but the _bytes() function on line 66 still uses 'latin-1' encoding. For consistency and to properly support UTF-8 characters (like Cyrillic), this should also be updated to 'utf-8'.

Note that this change also requires updating corresponding decode('latin-1') calls elsewhere in the codebase (e.g., in header_ecc.py line 603 and structural_adaptive_ecc.py line 657) to use decode('utf-8') instead, to ensure proper round-trip encoding/decoding of file paths with non-ASCII characters.
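The round-trip concern can be illustrated with a short, self-contained sketch. It does not use any pyFileFixity code; the path and variable names are hypothetical, and it only shows what happens in general when bytes encoded as UTF-8 are decoded with a different codec.

```python
path = "папка/файл.txt"  # hypothetical path with Cyrillic characters

raw = path.encode("utf-8")

# Decoding with the same codec restores the original string.
matched = raw.decode("utf-8")

# latin-1 maps every byte to a character, so this never raises,
# but the result is mojibake rather than the original path.
mismatched = raw.decode("latin-1")

# The reverse direction is stricter: not every byte sequence is valid UTF-8.
try:
    b"\xff".decode("utf-8")
    arbitrary_bytes_ok = True
except UnicodeDecodeError:
    arbitrary_bytes_ok = False

print(matched == path, mismatched == path, arbitrary_bytes_ok)
```

A mismatched pair fails silently in one direction (wrong characters, no exception), which is why the encode and decode sides need to be changed together. The last check is a general property of the codecs worth keeping in mind: latin-1 accepts arbitrary bytes, whereas UTF-8 rejects invalid sequences.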

@copilot open a new pull request to apply changes based on this feedback

lrq3000 · 2025-11-13T11:25:42Z

@codex

encode to utf-8 instead of latin-1

801bbe4

lrq3000 requested a review from Copilot November 13, 2025 11:21

Copilot started reviewing on behalf of lrq3000 November 13, 2025 11:21

Copilot finished reviewing on behalf of lrq3000 November 13, 2025 11:23

Copilot AI reviewed Nov 13, 2025

2 participants
